A method and apparatus for processing a geometric relationship
between the image motion of pairs of points over multiple image
frames representing a three-dimensional scene. This relationship is
based on the parallax motion of points with respect to an arbitrary
planar surface, and does not involve epipolar geometry. A constraint
is derived over two frames (504) for any pair of points (506),
relating their projective structure (with respect to the plane) based
only on their image coordinates and their parallax displacements.
Similarly, a 3D-rigidity constraint between pairs of points over
multiple frames is derived. Applications disclosed for these
parallax-based constraints include recovery of 3D scene structure
(512), detection of moving objects in the presence of camera induced
motion (510), and synthesis of new camera views based on a given set
of views (514). Moreover, the approach can handle difficult
situations for 3D scene analysis, e.g., where there is only a small
set of parallax vectors, and in the presence of independently moving
objects.
FIELD OF THE INVENTION
The invention generally relates to image processing systems and, more
particularly, to a method and apparatus for processing the parallax geometry of
pairs of points within a three-dimensional scene.
BACKGROUND OF THE INVENTION
The analysis of three-dimensional scenes from image sequences has a
number of goals. These goals include, but are not limited to: (i) the recovery of
3D scene structure, (ii) the detection of moving objects in the presence of camera
induced motion, and (iii) the synthesis of new camera views based on a given set
of views.

The traditional approach to these types of problems has been to first
recover the epipolar geometry between pairs of frames and then apply that
information to achieve the above-mentioned goals. However, this approach is
plagued with the difficulties associated with the recovery of the epipolar
geometry.

Recent approaches to 3D scene analysis have attempted to overcome some
of the difficulties in recovering the epipolar geometry by decomposing the motion
into a combination of a planar homography and residual parallax. The residual
parallax motion depends on the projective structure, and the translation
between the camera origins. While these methods remove some of the
ambiguities in estimating camera rotation, they still require the explicit
estimation of the epipole itself, which can be difficult under many
circumstances. In particular, epipole estimation is ill-conditioned when the
epipole lies significantly away from the center of the image and the parallax
motion vectors are nearly parallel to each other. Also, when there are only a
small number of parallax vectors and the scene contains moving objects, these
objects incorrectly influence the estimation of the epipole.

In general, the treatment of multipoint geometry assumes that the scene
is static and relies on the fact that almost all points selected for the shape
estimation are known to belong to a single rigid body. In its current form, this
class of methods has drawbacks; for example, these methods do not address the
problem of shape recovery in dynamic scenes, in particular when the amount of
image motion due to independently moving objects is not negligible.

SUMMARY OF THE INVENTION
The present invention involves a technique for image processing which
receives a plurality of two dimensional images representative of a scene,
computes a parallax-related constraint for a pair of points within the plurality of
images, applies the parallax-related constraint to a plurality of points within the
plurality of images in order to generate information representative of whether a
given point within the plurality of images is consistent with the parallax-related
constraint, and uses the generated information for an image processing task
related to the received plurality of images.

BRIEF DESCRIPTION OF THE DRAWINGS 
The teachings of the present invention can be readily understood by
considering the following detailed description in conjunction with the
accompanying drawings in which:

FIG. 1 depicts a block diagram of a three-dimensional scene analysis
system;

FIG. 2 provides a geometric illustration of the planar homography plus
parallax decomposition;

FIG. 3 depicts a geometric illustration of a pairwise parallax constraint;

FIG. 4 depicts a geometric illustration of a scene where epipole estimation
is unreliable, but the relative structure constraint can be reliably used to recover
relative structure within the scene;

FIG. 5 depicts a flow diagram of a routine which utilizes the parallax
constraint;

FIGS. 6 and 7 provide an illustration of parallax geometry and the dual of
the epipole; and

FIGS. 8a-g depict a series of images used and produced during shape
recovery that relies on a single parallax vector.

FIGS. 9a-b illustrate reliable detection of 3D motion inconsistency with
sparse parallax information using a ball and tree.

FIGS. 10a-f are a series of images illustrating moving object detection
relying on a single parallax vector.

FIGS. 11a-f are a series of images, like FIGS. 10a-f, illustrating moving
object detection relying on a single parallax vector.

To facilitate understanding, identical reference numerals have been used,
where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION

Overview 

The present invention uses, in various image processing tasks, the
geometric relationships between the parallax displacements of two or more
points within two or more images. The invention applies these relationships to
various image processing tasks such as (i) the recovery of 3D scene structure, (ii)
the detection of moving objects in the presence of camera induced motion, and
(iii) the synthesis of new camera views based on a given set of views.

An important advantage of the present invention is the ability to
effectively operate in difficult image processing situations (e.g., when there are a
small number of parallax vectors, when the epipole estimation is ill-conditioned,
and in the presence of moving objects). The present invention does not require
the recovery of epipoles during processing; therefore, it applies to situations
when the accurate recovery of the epipole is difficult. Moreover, the present
techniques for 3D scene analysis are applicable when only a small set of
parallax vectors is available. In fact, the planar parallax of a single point can be
used as a reference to recover the structure of the entire scene, and to determine
whether other points belong to the static scene or to a moving object.

The results presented herein are expressed in terms of the residual
parallax displacements of points after canceling a planar homography. The
decomposition of image motion into a homography plus parallax was shown to
be more robust, yet more general, than the decomposition into rotation plus
translation. Techniques for estimating the planar homography from pairs of
images are described in J.R. Bergen, P. Anandan, K.J. Hanna, and R. Hingorani,
"Hierarchical model-based motion estimation," European Conference on
Computer Vision, pages 237-252, Santa Margherita Ligure, May 1992, which is
herein incorporated by reference.

In the present invention, a parallax-based structure constraint is derived
that relates the projective structure of two points to their image positions and
their parallax displacements. By eliminating the relative projective structure of
a pair of points between three frames, a constraint, referred to as the
parallax-based rigidity constraint, on the parallax displacements of two points moving as
a rigid object over those frames is derived.

Also presented below is an alternative way of deriving the parallax-based
rigidity constraint. In the alternative derivation, the constraint is determined
geometrically rather than algebraically. Doing so leads to a simple and intuitive
geometric interpretation of the multiframe rigidity constraint and to the
derivation of a dual point to the epipole.

Examples of applications of these parallax-based constraints to solving 
three important problems in 3D scene analysis are also described. The 
applications include: (i) the recovery of 3D scene structure, (ii) the detection of 
moving objects in the presence of camera induced motion, and (iii) the synthesis 
of new camera views based on a given set of views. 

Finally, the generalization of the constraint to full image motion by 
including the planar homography component is described. 

Exemplary Embodiment 

Turning to the figures, FIG. 1 depicts a block diagram of a 
three-dimensional scene analysis system 100 suitable for implementing the
present invention. The system contains an image source 102, a computer 
system 104, one or more output devices 106 and one or more input devices 108. 
The image source 102 can be a video camera, an infrared camera, or some other 
sensor that generates a series of two-dimensional images representing a scene. 
Alternatively, the image source can be a storage device such as a video tape 
recorder, disk drive or some other means for storing sequential images 
representing a scene. The system generally processes digital images; therefore, 
if the image source produces analog images, a digitizer (not shown) is used 
between the image source and the computer system. 

The general purpose computer 104 facilitates image processing, scene
analysis and image display. Specifically, the computer system contains a data
buffer 110, a central processing unit (CPU) 112, support circuitry 114, random
access memory (RAM) 116, read only memory (ROM) 118, and a display
driver 120. Additionally, a user interacts with the computer system through one
or more input devices 108 such as a keyboard, mouse, trackball, touchpad, and
the like. Also, the computer system displays the images and various graphical
interface displays (screens) on the output display device 106 such as a computer
monitor. Alternatively, the computer system may also interact with other
output display devices such as a printer to provide a "hard copy" of any display
that appears on the computer monitor.

The data buffer 110 provides data rate equalization (frame buffering)
between the image source and the CPU. Typically, this buffer is a first-in,
first-out (FIFO) buffer. Such buffers are typically used to provide a constant
data rate to the CPU while providing flexibility in the data rates that can be
generated by an image source.

The CPU 112 is typically a general purpose processor such as a PowerPC,
Pentium, or some other generally available processor. PowerPC is a registered
trademark of International Business Machines of Armonk, New York and
Pentium is a registered trademark of Intel Corporation of Santa Clara,
California. Since the software implementation of the present invention is not
required to execute on any specific processor, the routines of the present
invention can be executed upon any type of processor or combination of
processors in a parallel processing computer environment. In addition, rather
than using a general purpose computer, the scene analysis may be accomplished
within a real-time image processor.

The CPU 112 operates in conjunction with various other circuits such as
RAM 116, ROM 118 and support circuitry 114 such as co-processor(s), clock
circuits, cache, power supplies and other well-known circuits. The operation and
interrelationship of these various computer components is well-known in the art
and does not require further explanation. The display driver 120 may be a video
card, printer driver or other common driver software or hardware as required by
the output device(s) 106.

The RAM 116 stores the software implementation of the present
invention. Typically, the routines of the invention are stored in a mass storage
device (not shown) and recalled for temporary storage in the RAM 116 when
executed by the CPU 112. In FIG. 1, the invention is embodied in a
three-dimensional scene analysis routine 122.






A. Parallax-Based Constraints on Pairs of Points
A constraint on the parallax motion of pairs of points between two image
frames that represent a three-dimensional scene as imaged by a video camera is
described below. The derived constraint can be used to recover a relative 3D
structure invariant of two points from their parallax vectors alone, without any additional
information, and in particular, without requiring the recovery of the camera epipoles.

The parallax constraint is extended to multiple frames to form a rigidity constraint on 
any pair of image points (similar to the trilinear constraint). Namely, inconsistencies in the 
3D motion of two points that belong to independently moving 3D objects can be detected 
based on their parallax displacements among three (or more) frames, without the need to 
estimate any 3D information. 

To derive the parallax constraint, first, the decomposition of the image motion into a 
homography (i.e., the image motion of an arbitrary planar surface) and residual parallax 
displacements is described. This decomposition is well-known in the art. 

1. The Planar Parallax Notations

FIG. 2 provides a geometric interpretation of the planar parallax. Let $\vec{P} = (X, Y, Z)^T$
and $\vec{P}' = (X', Y', Z')^T$ denote the Cartesian coordinates of a scene point with respect to
two different camera views, respectively. Let $\vec{x} = (x, y)^T$ and $\vec{x}' = (x', y')^T$ denote the
corresponding coordinates of the scene point as projected onto the image planes at the two
camera positions, respectively. Let $\vec{p} = (x, y, 1)^T = \frac{1}{Z}\vec{P}$ and
$\vec{p}' = (x', y', 1)^T = \frac{1}{Z'}\vec{P}'$ denote the same point in homogeneous coordinates, respectively.

Let S be an arbitrary planar surface and A the homography that aligns the planar
surface S between the second and first frame (i.e., for all points $\vec{P} \in S$, $\vec{P} = A\vec{P}'$). It can be
shown that the image motion can be written (in homogeneous coordinates) as:

$$\vec{p} = \vec{p}_w + k(\vec{p}_w - \vec{e}) \qquad (1)$$

In FIG. 2, $\vec{p}_w$ is the image point in the first frame which corresponds to warping $\vec{p}'$
by the homography A: $\vec{p}_w = \frac{A\vec{p}'}{a_3\vec{p}'}$ ($a_3$ being the third row of A). When $\vec{P}$ is on the 3D
planar surface S, then $\vec{p}_w = \vec{p}$. Otherwise, the remaining displacement between $\vec{p}_w$ and $\vec{p}$ is
proportional to the 3D projective structure $k = \frac{d_N T_z}{Z d'}$ of $\vec{P}$ with respect to the planar
surface S. Also, $\vec{e} = \frac{1}{T_z}\vec{T}$ is the epipole in the first frame, $\vec{T} = (T_x, T_y, T_z)^T$ is the
3D translation between the two frames as expressed in the coordinate system of the
first frame, $d_N$ is the distance of $\vec{P}$ from the plane S, and $d'$ is the distance to the
plane from the second camera center.

Shape Invariance: Note that the projective structure k depends both on the 3D
structure and on the camera translation $\vec{T}$. However, $\frac{k_1}{k_2} = \frac{d_{N_1} Z_2}{d_{N_2} Z_1}$ for any two
points $\vec{P}_1$ and $\vec{P}_2$ is view-independent, and is therefore a relative structure
invariant with respect to camera motion.

When $T_z = 0$, Eq. (1) can be rewritten in terms of $\vec{p}_w$ as:

$$\vec{p} = \vec{p}_w - \tau\vec{T}, \qquad (2)$$

where $\tau = \frac{d_N}{Z d'}$. Once again, $\frac{\tau_1}{\tau_2} = \frac{d_{N_1} Z_2}{d_{N_2} Z_1}$ provides the same relative structure
invariant.

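Rewriting Eq. (1) in the form of image displacements yields, in
homogeneous coordinates:

$$\vec{p}' - \vec{p} = (\vec{p}' - \vec{p}_w) - k(\vec{p}_w - \vec{e}), \qquad (3)$$

and in 2D image coordinates:

$$\vec{u} = (\vec{x}' - \vec{x}) = (\vec{x}' - \vec{x}_w) - k(\vec{x}_w - \vec{x}_e) = \vec{u}_\pi + \vec{\mu}, \qquad (4)$$

where $\vec{x}_w = (x_w, y_w)^T$ signifies the image point where $\vec{p}_w = (x_w, y_w, 1)^T$ intersects the image
plane of the first frame, $\vec{x}_e$ signifies the 2D image coordinates of the epipole in the
first frame, $\vec{u}_\pi$ denotes the planar part of the 2D image motion (the homography
due to S), and $\vec{\mu}$ denotes the residual parallax 2D motion.

Similarly, when $T_z = 0$ (Eq. (2)):

$$\vec{u} = (\vec{x}' - \vec{x}) = (\vec{x}' - \vec{x}_w) + \tau\vec{t} = \vec{u}_\pi + \vec{\mu}, \qquad (5)$$

where $\vec{t} = (T_x, T_y)^T$.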

Eqs. (4) and (5) provide the form of parallax notation that will be used in
the remainder of this description. Note that they are expressed in terms of 2D image
coordinates. Although we derived the parallax notation differently for
$T_z = 0$ and $T_z \neq 0$, the two cases will be unified and treated as a single case in the
following sections.
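
The decomposition of Eq. (4) can be checked numerically on a synthetic scene. The following is a minimal sketch, not the implementation of the invention: it assumes a unit focal length, no rotation between the two views, and a reference plane at depth `plane_z` in the first camera, so that the homography takes the closed form noted in the docstring. The function names and the numeric values are hypothetical.

```python
# Assumed setup (not from the text): unit focal length, no rotation between
# the views, and the reference plane S at depth plane_z in the first camera.

def project(P):
    """Perspective projection of a 3D point (X, Y, Z) onto the image plane."""
    return (P[0] / P[2], P[1] / P[2])


def planar_parallax(P, T, plane_z):
    """Return (x, x_w, mu) for scene point P, where mu = x_w - x as in Eq. (4).

    P is given in the first camera's coordinates and P' = P - T in the second.
    For the plane Z' = d' (seen from the second camera) the aligning homography
    is A = I + T n^T / d' with n = (0, 0, 1), so A P' = P' + T * Z' / d'.
    """
    P_prime = tuple(p - t for p, t in zip(P, T))
    d_prime = plane_z - T[2]
    warped = tuple(pp + t * P_prime[2] / d_prime for pp, t in zip(P_prime, T))
    x, x_w = project(P), project(warped)
    return x, x_w, (x_w[0] - x[0], x_w[1] - x[1])


T = (1.0, 0.5, 2.0)                      # hypothetical camera translation
x_e = (T[0] / T[2], T[1] / T[2])         # epipole in the first frame

x, x_w, mu = planar_parallax((2.0, 1.0, 6.0), T, plane_z=10.0)
# The residual parallax lies along the line joining x_w and the epipole,
# so the 2D cross product of mu and (x_e - x_w) vanishes.
cross = mu[0] * (x_e[1] - x_w[1]) - mu[1] * (x_e[0] - x_w[0])
print("mu =", mu, "cross =", cross)

# A point lying on the plane has no residual parallax.
_, _, mu_plane = planar_parallax((3.0, 2.0, 10.0), T, plane_z=10.0)
print("mu on plane =", mu_plane)
```

In this sketch the epipole is used only to verify the direction of the parallax vector; the constraints derived below never require it.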
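2. The Parallax-Based Structure Constraint

Let $\vec{x}_1 = (x_1, y_1)^T$ and $\vec{x}_2 = (x_2, y_2)^T$ be two image points. We start by analyzing the
case when $T_z \neq 0$. From Eq. (4) we have:

$$\vec{\mu}_1 = -k_1(\vec{x}_{w_1} - \vec{x}_e), \qquad \vec{\mu}_2 = -k_2(\vec{x}_{w_2} - \vec{x}_e).$$

Let $\gamma_i$ denote the part of the projective structure $k_i$ which is point-dependent, i.e.,
$\gamma_i = \frac{d_{N_i}}{Z_i}$ $(i = 1, 2)$, then:

$$\vec{\mu}_1 = -\gamma_1\frac{T_z}{d'}(\vec{x}_{w_1} - \vec{x}_e), \qquad \vec{\mu}_2 = -\gamma_2\frac{T_z}{d'}(\vec{x}_{w_2} - \vec{x}_e). \qquad (8)$$

Therefore,

$$\vec{\mu}_1\gamma_2 - \vec{\mu}_2\gamma_1 = \gamma_1\gamma_2\frac{T_z}{d'}(\vec{x}_{w_2} - \vec{x}_{w_1}). \qquad (9)$$

This last step eliminated the epipole $\vec{x}_e$. Eq. (9) entails that the vectors on both sides of the
equation are parallel. Since $\gamma_1\gamma_2\frac{T_z}{d'}$ is a scalar, we get:

$$(\vec{\mu}_1\gamma_2 - \vec{\mu}_2\gamma_1) \parallel \Delta\vec{x}_w,$$

where $\Delta\vec{x}_w = (\vec{x}_{w_2} - \vec{x}_{w_1})$. This leads to the pairwise parallax constraint

$$(\vec{\mu}_1\gamma_2 - \vec{\mu}_2\gamma_1)^T(\Delta\vec{x}_w)_\perp = 0,$$

where $\vec{v}_\perp$ signifies a vector perpendicular to $\vec{v}$. From Eq. (4), $\vec{x}_{w_i} = \vec{x}_i + \vec{\mu}_i$ $(i = 1, 2)$.
Hence, we can rewrite the parallax constraint in terms of the pixel coordinates of the two
points and the parallax vectors as:

$$(\vec{\mu}_1\gamma_2 - \vec{\mu}_2\gamma_1)^T(\Delta\vec{x} + \Delta\vec{\mu})_\perp = 0, \qquad (10)$$

where $\Delta\vec{x} = \vec{x}_2 - \vec{x}_1$, and $\Delta\vec{\mu} = \vec{\mu}_2 - \vec{\mu}_1$.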

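Eq. (10) provides a constraint on the structure parameters ($\gamma_1$ and $\gamma_2$) of the two
points, using only their image (pixel) coordinates and their parallax vectors.

Eq. (10) is homogeneous in $\gamma_1$ and $\gamma_2$, so it constrains only their ratio. If $\vec{x}_1$ is chosen
so that its parallax vector $\vec{\mu}_1 \neq 0$ (i.e., $\vec{P}_1 \notin S$), then the parallax constraint can be
rewritten in terms of the relative structure invariant (with respect to the first frame) of the
two points, $\frac{\gamma_2}{\gamma_1} = \frac{d_{N_2} Z_1}{d_{N_1} Z_2}$:

$$\left(\vec{\mu}_1\frac{\gamma_2}{\gamma_1} - \vec{\mu}_2\right)^T(\Delta\vec{x} + \Delta\vec{\mu})_\perp = 0. \qquad (11)$$

When $T_z = 0$, a constraint stronger than Eq. (11) can be derived:

$$\left(\vec{\mu}_1\frac{\gamma_2}{\gamma_1} - \vec{\mu}_2\right) = 0;$$

however, Eq. (11) still holds. This is important, as we do not have a priori
knowledge of $T_z$ to distinguish between the two cases.

Therefore, given only the parallax vectors of the two image points, their
relative structure $\frac{\gamma_2}{\gamma_1}$ can be recovered from Eq. (11):

$$\frac{\gamma_2}{\gamma_1} = \frac{\vec{\mu}_2^T(\Delta\vec{x} + \Delta\vec{\mu})_\perp}{\vec{\mu}_1^T(\Delta\vec{x} + \Delta\vec{\mu})_\perp},$$

or more concisely, with $\Delta\vec{x}_w = \Delta\vec{x} + \Delta\vec{\mu}$,

$$\frac{\gamma_2}{\gamma_1} = \frac{\vec{\mu}_2^T(\Delta\vec{x}_w)_\perp}{\vec{\mu}_1^T(\Delta\vec{x}_w)_\perp}. \qquad (12)$$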
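
The ratio of Eq. (12) involves only two dot products, as the following minimal sketch illustrates. The function name `relative_structure`, the tolerance `eps`, and the sample coordinates are assumptions made for illustration; the sample values were hand-computed for a hypothetical synthetic scene (plane at depth 10, translation (1, 0.5, 2), scene points (2, 1, 6) and (-1, 2, 4)), for which the expected ratio $\frac{d_{N_2} Z_1}{d_{N_1} Z_2}$ is 2.25.

```python
def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def relative_structure(x1, mu1, x2, mu2, eps=1e-12):
    """Return gamma_2 / gamma_1 following Eq. (12).

    x1, x2 are image points in the reference frame; mu1, mu2 are their planar
    parallax vectors. Returns None when the denominator is (numerically) zero.
    """
    # Delta x_w = (x2 + mu2) - (x1 + mu1)
    dx_w = (x2[0] + mu2[0] - x1[0] - mu1[0],
            x2[1] + mu2[1] - x1[1] - mu1[1])
    n = perp(dx_w)
    denominator = dot(mu1, n)
    if abs(denominator) < eps:
        return None
    return dot(mu2, n) / denominator


# Hypothetical, hand-computed sample values (see the lead-in for the scene).
x1, mu1 = (1 / 3, 1 / 6), (-1 / 30, -1 / 60)
x2, mu2 = (-0.25, 0.5), (-0.45, 0.15)
ratio = relative_structure(x1, mu1, x2, mu2)
print(ratio)  # approximately 2.25
```

Either perpendicular direction may be used in `perp`, since the sign cancels between numerator and denominator.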

FIG. 3 displays the constraint geometrically. As shown in this figure,
$\frac{\gamma_2}{\gamma_1} = \frac{AC}{AB}$, where AC and AB are defined in FIG. 3.

The benefit of the constraint in Eq. (12) is that it provides this information 
directly from the positions and parallax vectors of the two points, without the 
need to go through the computation of the epipole, using as much information as 
one point can give on another. 

FIG. 4 graphically shows an example of a configuration in which estimating 
the epipole is very unreliable, whereas, estimating the relative structure directly 
from Eq. (12) is reliable. 

3. The Parallax-Based Rigidity Constraint

This section describes how the parallax-based structure constraint can be
extended to multiple frames, to form a rigidity constraint on pairs of image points
that contains neither structure parameters nor camera geometry.

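Rigidity Over Multiple Frames:

Let $\vec{x}_1 = (x_1, y_1)^T$ and $\vec{x}_2 = (x_2, y_2)^T$ be two image points in the first (reference) frame.
Let $\vec{\mu}_i^j$ denote the 2D planar parallax motion of $\vec{x}_i$ from the first frame to frame j.
From Eq. (4) we see that $\vec{x}_{w_i}^j$ can be computed by $\vec{x}_{w_i}^j = \vec{x}_i + \vec{\mu}_i^j$ $(i = 1, 2)$.

Using the structure invariance constraint of Eq. (12), for any two frames j
and k we get:

$$\frac{\gamma_2}{\gamma_1} = \frac{(\vec{\mu}_2^j)^T(\Delta\vec{x}_w^j)_\perp}{(\vec{\mu}_1^j)^T(\Delta\vec{x}_w^j)_\perp} = \frac{(\vec{\mu}_2^k)^T(\Delta\vec{x}_w^k)_\perp}{(\vec{\mu}_1^k)^T(\Delta\vec{x}_w^k)_\perp}.$$

Multiplying by the denominators yields the rigidity constraint of the two
points over three frames (reference frame, frame j, and frame k):

$$\left((\vec{\mu}_1^k)^T(\Delta\vec{x}_w^k)_\perp\right)\left((\vec{\mu}_2^j)^T(\Delta\vec{x}_w^j)_\perp\right) - \left((\vec{\mu}_2^k)^T(\Delta\vec{x}_w^k)_\perp\right)\left((\vec{\mu}_1^j)^T(\Delta\vec{x}_w^j)_\perp\right) = 0. \qquad (13)$$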

Based on the planar parallax motion trajectory of a single image point (e.g.,
$\vec{x}_1$) over several frames, the rigidity constraint (13) states a constraint on the
planar parallax motion trajectory of any other point (e.g., $\vec{x}_2$). The rigidity
constraint of Eq. (13) can therefore be applied to detect inconsistencies in the 3D
motion of two image points (i.e., indicate whether the two image points are
projections of 3D points belonging to the same or to different 3D moving objects) based
on their parallax motion among three (or more) frames alone, without the need to
estimate either camera geometry or structure parameters.
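
The left-hand side of Eq. (13) can be evaluated directly from two image points and their parallax vectors in two additional frames. The sketch below is illustrative only: the function name `rigidity_residual` and the numeric values are assumptions, with the values hand-computed for a hypothetical synthetic scene (two static points, then the same pair with the second point's parallax in frame k perturbed to mimic independent motion).

```python
def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def rigidity_residual(x1, x2, mu1_j, mu2_j, mu1_k, mu2_k):
    """Left-hand side of Eq. (13) for two points over frames j and k.

    The value is zero (up to noise) when both points move as one rigid body.
    """
    def projections(m1, m2):
        # Project both parallax vectors onto the normal of Delta x_w.
        n = perp((x2[0] + m2[0] - x1[0] - m1[0],
                  x2[1] + m2[1] - x1[1] - m1[1]))
        return dot(m1, n), dot(m2, n)

    a1_j, a2_j = projections(mu1_j, mu2_j)
    a1_k, a2_k = projections(mu1_k, mu2_k)
    return a1_k * a2_j - a2_k * a1_j


# Hypothetical, hand-computed values for a synthetic static scene.
x1, x2 = (1 / 3, 1 / 6), (-0.25, 0.5)
mu1_j, mu2_j = (-1 / 30, -1 / 60), (-0.45, 0.15)
mu1_k, mu2_k = (-1 / 75, 7 / 75), (-0.15, 0.3)

rigid = rigidity_residual(x1, x2, mu1_j, mu2_j, mu1_k, mu2_k)
# Perturbing the second point's parallax in frame k breaks the constraint.
moving = rigidity_residual(x1, x2, mu1_j, mu2_j, mu1_k, (-0.15, 0.1))
print(rigid, moving)
```

With real measurements the residual is never exactly zero, so any practical use would compare it against a noise-dependent tolerance, which this sketch does not attempt to model.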

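Rigidity Over Multiple Points:

Instead of considering pairs of points over multiple frames, an alternative
is to consider multiple points over two frames to come up with a different form of
the rigidity constraint.

Let $\vec{x}_1 = (x_1, y_1)^T$, $\vec{x}_2 = (x_2, y_2)^T$ and $\vec{x}_3 = (x_3, y_3)^T$ be three image points in the
first (reference) frame. Let $\vec{\mu}_i$ denote the 2D planar parallax motion of $\vec{x}_i$ from the
first frame to any other frame, and $\vec{x}_{w_i} = \vec{x}_i + \vec{\mu}_i$ $(i = 1, 2, 3)$. Let
$\Delta\vec{x}_{w_{a,b}} = \vec{x}_{w_a} - \vec{x}_{w_b}$.

Using the shape invariance constraint of Eq. (12):

$$\frac{\gamma_2}{\gamma_1} = \frac{\vec{\mu}_2^T(\Delta\vec{x}_{w_{2,1}})_\perp}{\vec{\mu}_1^T(\Delta\vec{x}_{w_{2,1}})_\perp}, \qquad
\frac{\gamma_3}{\gamma_2} = \frac{\vec{\mu}_3^T(\Delta\vec{x}_{w_{3,2}})_\perp}{\vec{\mu}_2^T(\Delta\vec{x}_{w_{3,2}})_\perp}, \qquad
\frac{\gamma_3}{\gamma_1} = \frac{\vec{\mu}_3^T(\Delta\vec{x}_{w_{3,1}})_\perp}{\vec{\mu}_1^T(\Delta\vec{x}_{w_{3,1}})_\perp}.$$

Using the equality

$$\frac{\gamma_3}{\gamma_1} = \frac{\gamma_3}{\gamma_2}\cdot\frac{\gamma_2}{\gamma_1}$$

and multiplying by the denominators, we get the rigidity constraint for three
points over a pair of frames:

$$\left(\vec{\mu}_3^T(\Delta\vec{x}_{w_{3,2}})_\perp\right)\left(\vec{\mu}_2^T(\Delta\vec{x}_{w_{2,1}})_\perp\right)\left(\vec{\mu}_1^T(\Delta\vec{x}_{w_{3,1}})_\perp\right) - \left(\vec{\mu}_2^T(\Delta\vec{x}_{w_{3,2}})_\perp\right)\left(\vec{\mu}_1^T(\Delta\vec{x}_{w_{2,1}})_\perp\right)\left(\vec{\mu}_3^T(\Delta\vec{x}_{w_{3,1}})_\perp\right) = 0. \qquad (14)$$

The benefit of the rigidity constraint (14) is in the fact that it provides this
information directly from the positions and parallax vectors of the three points,
without the need to go through the unstable computation of the epipole, using as
much information as two points can give on the third.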
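
The three-point form of Eq. (14) can be evaluated in the same manner as the pairwise constraints. The following sketch is hedged in the usual way: `three_point_residual` is a hypothetical name, and the sample points were hand-computed for a synthetic static scene, with a fourth call that perturbs the third parallax vector to show a violation.

```python
def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def warped_delta(xa, mua, xb, mub):
    """Delta x_w for the ordered pair (a, b): (xa + mua) - (xb + mub)."""
    return (xa[0] + mua[0] - xb[0] - mub[0],
            xa[1] + mua[1] - xb[1] - mub[1])


def three_point_residual(p1, p2, p3):
    """Left-hand side of Eq. (14); each argument is a pair (x_i, mu_i)."""
    (x1, m1), (x2, m2), (x3, m3) = p1, p2, p3
    n21 = perp(warped_delta(x2, m2, x1, m1))
    n32 = perp(warped_delta(x3, m3, x2, m2))
    n31 = perp(warped_delta(x3, m3, x1, m1))
    first = dot(m3, n32) * dot(m2, n21) * dot(m1, n31)
    second = dot(m2, n32) * dot(m1, n21) * dot(m3, n31)
    return first - second


# Hypothetical, hand-computed values for three static points and two frames.
p1 = ((1 / 3, 1 / 6), (-1 / 30, -1 / 60))
p2 = ((-0.25, 0.5), (-0.45, 0.15))
p3 = ((0.2, -0.2), (-0.1, -0.15))

rigid = three_point_residual(p1, p2, p3)
perturbed = three_point_residual(p1, p2, ((0.2, -0.2), (-0.1, 0.15)))
print(rigid, perturbed)
```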

B. Parallax Geometry and an Epipole Dual

In this section, a different way of deriving the parallax-based rigidity
constraint is described. Rather than deriving the constraint algebraically, the
alternative derivation uses geometry. This leads to a simple and intuitive
geometric interpretation of the multiframe rigidity constraint, and to the
derivation of a dual point to the epipole. Although this distinct image point (the
epipole dual) is pointed out, the rigidity constraint itself does not require the
estimation of the dual epipole, just as it did not require the estimation of the
epipole itself.

FIG. 6 illustrates the 3D geometric structure associated with the planar
parallax of pairs of points between two frames. In this figure, S is the planar
surface, and P and Q are the two scene points. As in the case of FIG. 2, $P_w$ and
$Q_w$ are the intersections of rays O'P and O'Q with the plane S. Similarly, the
points $p_w$ and $q_w$ on the reference image are the projections of $P_w$ and $Q_w$, and are
therefore the points to which the planar homography transforms p' and q',
respectively. In other words, $p_w = Ap'$ and $q_w = Aq'$. Below, $p_w$ and $q_w$ are
referred to as "warped points."

Let R be the intersection of the line connecting P and Q with the plane S.
Note that the points P, Q, R, $P_w$ and $Q_w$ are co-planar. Hence, since $P_w$, $Q_w$ and R
also lie on S, they are colinear. Of course, P, Q and R are colinear by construction.

Let r be the projection of R on the reference image plane. Since
$R \in S$, $r = r_w$. Since p, q, r and $p_w$, $q_w$, r are the image projections of the colinear
points P, Q, R and $P_w$, $Q_w$, R, respectively, we can infer the following:
$p_w$, $q_w$ and r are colinear, and p, q and r are colinear.

In other words, the line connecting $p_w$ and $q_w$ and the line connecting
p and q intersect at r, the image of the point R.

Note that the point r does not depend on the second camera view.
Therefore, if multiple views are considered, then the lines connecting the warped
points $p_w^j = A^j p^j$ and $q_w^j = A^j q^j$ (for any frame j) meet at r for all such views.

FIG. 7 illustrates the convergence of the lines. Referring to that figure,
since the lines qC, pB and rA are parallel to each other and intersect the lines
qpr and CAB:

$$\frac{qr}{pr} = \frac{CA}{BA} = \frac{\gamma_2}{\gamma_1}.$$

Similarly,

$$\frac{qr}{pr} = \frac{FD}{ED} = \frac{\gamma_2}{\gamma_1}.$$

Hence

$$\frac{qr}{pr} = \frac{CA}{BA} = \frac{FD}{ED}.$$

This is the same as the rigidity constraint derived in Eq. (13). Note, however, the
rigidity constraint itself does not require the estimation of the point of
convergence r, just as it does not require the estimation of the epipole.

The point r is the dual of the epipole: the epipole is the point of
intersection of multiple parallax vectors between a pair of frames, i.e., the point of
intersection of all lines connecting each image point with its warped point
between a pair of frames, whereas the dual point r is the point of intersection of
all lines connecting a pair of points in the reference image and the corresponding
pair of warped points from all other frames.
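
The convergence of these lines can be illustrated with a small computation. The sketch below intersects the line through p and q with the line through their warped points, using homogeneous line coordinates; this particular formulation and the function names are assumptions chosen for brevity, and the sample values were hand-computed for a hypothetical static scene observed in two additional frames j and k.

```python
def cross3(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(a, b):
    """Homogeneous coordinates of the line through 2D points a and b."""
    return cross3((a[0], a[1], 1.0), (b[0], b[1], 1.0))


def dual_point(p, q, p_w, q_w, eps=1e-12):
    """Intersection r of line (p, q) with line (p_w, q_w), or None if parallel."""
    h = cross3(line_through(p, q), line_through(p_w, q_w))
    if abs(h[2]) < eps:
        return None
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical, hand-computed values: same reference points, two other frames.
p, q = (1 / 3, 1 / 6), (-0.25, 0.5)
r_j = dual_point(p, q, (0.3, 0.15), (-0.7, 0.65))    # warped points, frame j
r_k = dual_point(p, q, (0.32, 0.26), (-0.4, 0.8))    # warped points, frame k
print(r_j, r_k)  # both approximately (0.8, -0.1)
```

As the text notes, the rigidity constraint itself does not need r; the computation is shown only to make the geometric interpretation concrete.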

C. Applications of Pairwise Parallax Geometry

This section describes how pairwise parallax geometry, in its various forms,
provides an approach to handling some well-known problems in 3D scene
analysis, in particular: (i) moving object detection, (ii) shape recovery, and (iii) new
view generation. It is shown that the parallax constraints provide the capability
for solving these problems without having to first solve a "more complex"
problem.

FIG. 5 depicts a flow diagram of a process for performing
three-dimensional scene analysis which uses the shape constraint of Eq. (12) (or
equivalently Eq. (11)) and the rigidity constraints of Eqs. (13) and (14). The
process is represented as an executable software routine 500 that begins at
step 502 and proceeds to step 504. At step 504, the routine is provided with a
plurality of input images. At step 506, the routine computes a planar parallax
motion (e.g., $\vec{\mu}$) for each point in the image. Then, at step 508, for each motion
trajectory determined in step 506, one or more of the constraints of Eqs. (11),
(12), (13), and (14) are applied with respect to all other points. The routine 500
uses the information from step 508 (e.g., information representing some image
points as being consistent with the constraint and some image points being
inconsistent with the constraint) within one or more image processing tasks.
These tasks include, but are not limited to, moving object detection (step 510),
shape recovery (step 512), and new view generation (step 514). Each of these
illustrative applications of the inventive technique is described below.

1. Estimating Planar Parallax Motion

The estimation of the planar parallax motion used for performing the
experiments presented in this section was done using two successive
computational steps: (i) 2D image alignment to compensate for a detected
planar motion (i.e., the homography in the form of a 2D parametric
transformation), and (ii) estimation of residual image displacements between the
aligned images (i.e., the parallax). Such a system is disclosed in U.S. provisional
patent application serial number 60/011,496 filed 2/12/96 (Attorney Docket No.
12040) and herein incorporated by reference.



2. Shape Recovery
The parallax-based structure constraint (Eq. (12)) can be used to recover a
3D relative structure between pairs of points directly from their parallax
vectors. This implies that the structure of the entire scene can be recovered
relative to a single reference image point (with non-zero parallax). Singularities
occur when the denominator of the constraint (Eq. (12)) tends to zero, i.e., for
points that lie on the line passing through the reference point in the direction of
its parallax vector.

FIGS. 8a-g show an example of recovering structure of an entire scene
relative to a single reference point. Three views, obtained by a hand-held camera,
of a rug covered with toy cars and boxes whose heights were measured were used
as the source data. The detected 2D planar motion was that of the rug
(FIG. 8d). A single point with non-zero planar parallax was selected as a
reference point for estimating relative shape (FIG. 8e). FIG. 8f shows the
recovered relative structure of the entire scene from two frames (FIGS. 8b and
8c). Regions close to the image boundary were ignored. The obtained results
were quite accurate except along the singular line in the direction of the
parallax of the reference point. The singular line is evident in FIG. 8f.

The singularities can be removed and the quality of the computed
structure can be improved either by using multiple frames or by using multiple
reference points:

• Multiple frames: Singularities are removed by using multiple frames if
their epipoles are non-colinear. Non-colinearity of epipoles can be detected
through change in the parallax direction of the reference image point.

• Multiple points: Singularities can be removed by using additional
reference image points. An additional reference point should be chosen so that:
(i) it does not lie on the singular line (i.e., in the direction of the parallax vector)
of the first reference point (it should preferably be chosen on the line
perpendicular to that), and (ii) the additional reference point should be first
verified to move consistently with the first reference point through the rigidity
constraint of Eq. (13) over a few frames.

Combinations of multiple reference points over multiple frames can also
be used. FIG. 8g shows an example of recovering structure of an entire scene
from three frames relative to the same single reference point as in FIG. 8f. The
singular line in FIG. 8f has disappeared.
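
One possible way to combine several frames when recovering relative structure is sketched below. The text does not state how the frames were combined for FIG. 8g, so the least-squares pooling used here is purely an assumption for illustration: each frame contributes the numerator and denominator of Eq. (12), and the ratio is fitted over all frames so that a frame in which the denominator vanishes contributes nothing instead of causing a singularity. All names and sample values are hypothetical.

```python
def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def relative_structure_multiframe(x_ref, mus_ref, x, mus, eps=1e-12):
    """Estimate gamma / gamma_ref from the per-frame terms of Eq. (12).

    mus_ref and mus list the parallax vectors of the reference point and of
    the other point, one entry per frame. The ratio is the least-squares
    solution of a = ratio * a_ref over all frames (an assumed pooling rule).
    """
    num = 0.0
    den = 0.0
    for m_ref, m in zip(mus_ref, mus):
        n = perp((x[0] + m[0] - x_ref[0] - m_ref[0],
                  x[1] + m[1] - x_ref[1] - m_ref[1]))
        a_ref, a = dot(m_ref, n), dot(m, n)
        num += a_ref * a
        den += a_ref * a_ref
    return None if den < eps else num / den


# Hypothetical, hand-computed values for a synthetic static scene.
x_ref, mus_ref = (1 / 3, 1 / 6), [(-1 / 30, -1 / 60), (-1 / 75, 7 / 75)]
x, mus = (-0.25, 0.5), [(-0.45, 0.15), (-0.15, 0.3)]
ratio = relative_structure_multiframe(x_ref, mus_ref, x, mus)

# A point on the singular line of the reference point, seen in one frame only.
singular = relative_structure_multiframe(
    x_ref, mus_ref[:1], (0.25, 0.125), [(-0.05, -0.025)])
print(ratio, singular)  # approximately 2.25, and None
```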

The ability to obtain relatively good structure information even with
respect to a single point has several important virtues:

• It does not require the estimation of the epipole, and therefore does not
require dense parallax information.

• Unlike conventional techniques for recovering structure, it provides the
capability to handle dynamic scenes, as it does not require having a collection of
image points which is known a priori to belong to a single 3D moving object.

• Since it relies on a single parallax vector, it provides a natural
continuous way to bridge the gap between 2D cases, which assume only planar
motion exists, and 3D cases, which rely on having parallax data.

3. Moving Object Detection

A number of techniques exist to handle multiple-motion analysis in the
simpler 2D case, where motions of independently moving objects are modeled by
2D parametric transformations. These methods, however, detect points with
planar parallax motion as moving objects, as they have a different 2D image
motion than the planar part of the background scene.

In the general 3D case, the moving object detection problem is much more
complex, since it requires detecting 3D motion inconsistencies. Typically, this is
done by recovering the epipolar geometry. Trying to estimate epipolar geometry
(i.e., camera motion) in the presence of multiple moving objects, with no prior
segmentation, is extremely difficult. This problem becomes even more acute
when there exists only sparse parallax information.

FIG. 9a graphically displays an example of a configuration in which
estimating the epipole in the presence of multiple moving objects can produce
relatively large errors, even when using clustering techniques in the epipole
domain as suggested by some conventional techniques. Relying on the epipole
computation to detect inconsistencies in 3D motion fails in detecting moving
objects in such cases.

In FIG. 9a, the camera is translating to the right. The only static object
with pure parallax motion is the tree. The ball is falling independently.
The epipole may be incorrectly computed as e. The false epipole e is consistent
with both motions.

The parallax rigidity constraint (Eq. 13) can be applied to detect
inconsistencies in the 3D motion of one image point relative to another directly
from their "parallax" vectors over multiple (three or more) frames, without the
need to estimate either camera geometry or shape parameters. This provides a
useful mechanism for clustering (or segmenting) the "parallax" vectors (i.e., the
residual motion after planar registration) into consistent groups belonging to
objects moving consistently in 3D, even in cases such as in FIG. 9a where the
parallax information is minimal, and the independent motion is not negligible.
FIG. 9b graphically illustrates how the rigidity constraint of Eq. (13), when
applied, detects the 3D inconsistency over three frames.
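As a minimal sketch of how such a pairwise test might be evaluated, the following assumes that Eq. (13) takes the cross-multiplied form (mu2_j . n_j)(mu1_k . n_k) - (mu1_j . n_j)(mu2_k . n_k) = 0, where n is the perpendicular to the difference between the two plane-registered point positions in the given frame. That form, the sign conventions, and all function names are assumptions made for illustration; Eq. (13) itself should be consulted for the exact expression.

```python
def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def rigidity_residual(obs1, obs2):
    """Residual of the assumed form of Eq. (13) for two points.

    obs1 and obs2 each hold two (p_w, mu) tuples, one per frame (j and k),
    both registered to the same reference frame: p_w is the plane-registered
    position of the point and mu is its residual parallax vector.
    A value near zero means the pair is consistent with a single rigid 3D motion.
    """
    (pw1_j, mu1_j), (pw1_k, mu1_k) = obs1
    (pw2_j, mu2_j), (pw2_k, mu2_k) = obs2
    n_j = perp(sub(pw2_j, pw1_j))
    n_k = perp(sub(pw2_k, pw1_k))
    return dot(mu2_j, n_j) * dot(mu1_k, n_k) - dot(mu1_j, n_j) * dot(mu2_k, n_k)
```

The residual is not normalized, so in practice it would probably need to be scaled by the magnitudes of the vectors involved before a threshold is applied.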

FIGS. 10a-f show an example of using the rigidity constraint of Eq. (13)
to detect 3D inconsistencies. In this sequence, the camera is in motion
(translating from left to right), inducing parallax motion of different
magnitudes on the house, road, and road-sign. The car moves independently
from left to right. The detected 2D planar motion was that of the house. The
planar parallax motion was computed after 2D registration of the three images
with respect to the house (FIG. 10d). A single point on the road-sign was
selected as a point of reference (FIG. 10e). FIG. 10f displays the measure of
inconsistency of each point in the image with respect to the selected road-sign
point. Bright regions indicate large values when applying the constraint of Eq.
(13) (i.e., violations in 3D rigidity detected over three frames with respect to the
road-sign point). The region which was detected as moving 3D-inconsistently
with respect to the road-sign point corresponds to the car. Regions close to the
image boundary were ignored. All other regions of the image were detected as
moving 3D-consistently with the road-sign point. Therefore, assuming an
uncalibrated camera, this method provides a mechanism for segmenting all
non-zero residual motion vectors (after 2D planar stabilization) into groups moving
consistently (in the 3D sense).
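The per-point inconsistency measure described for this example can be sketched as follows. The sketch again assumes the cross-multiplied form of Eq. (13); the dictionary layout, the function names, and the threshold are hypothetical choices made for illustration, not part of the disclosure.

```python
def _cross_term(mu, pw_a, pw_b):
    """Return mu . perp(pw_b - pw_a) for 2D tuples."""
    dx, dy = pw_b[0] - pw_a[0], pw_b[1] - pw_a[1]
    return -mu[0] * dy + mu[1] * dx


def inconsistency(ref, pt):
    """Absolute residual of the assumed Eq. (13) between a reference point and pt.

    ref and pt each hold two (p_w, mu) tuples, one per registered frame.
    """
    (rw_j, rm_j), (rw_k, rm_k) = ref
    (qw_j, qm_j), (qw_k, qm_k) = pt
    return abs(
        _cross_term(qm_j, rw_j, qw_j) * _cross_term(rm_k, rw_k, qw_k)
        - _cross_term(rm_j, rw_j, qw_j) * _cross_term(qm_k, rw_k, qw_k)
    )


def segment(tracks, ref_key, threshold):
    """Label every tracked point as 3D-inconsistent (True) or consistent (False)
    with respect to the chosen reference point."""
    ref = tracks[ref_key]
    return {
        key: inconsistency(ref, obs) > threshold
        for key, obs in tracks.items()
        if key != ref_key
    }
```

Plotting the value returned by `inconsistency` for every pixel with non-zero parallax would give a map comparable in spirit to the bright-region display described above.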

FIGS. 11a-f show another example of using the rigidity constraint of Eq.
(13) to detect 3D inconsistencies. In this sequence, the camera is mounted on a
helicopter flying from left to right, inducing some parallax motion (of different
magnitudes) on the house-roof and trees (bottom of the image), and on the
electricity poles (by the road). Three cars move independently on the road. The
detected 2D planar motion was that of the ground surface (FIG. 11d). A single
point was selected on a tree as a point of reference (FIG. 11e). FIG. 11f displays
the measure of inconsistency of each point in the image with respect to the
selected reference point. Bright regions indicate 3D-inconsistency detected over
three frames. The three cars were detected as moving inconsistently with the
selected tree point. Regions close to the image boundary were ignored. All other
image regions were detected as moving consistently with the selected tree point.

In the prior art, a rigidity constraint between three frames in the form of
a trilinear tensor has been presented using regular image displacements.
However, it requires having a collection of image points which is known
a priori to belong to the single 3D moving object. Selecting an inconsistent set of
points leads to an erroneous tensor and, hence, false moving object detection.

The ability of the parallax rigidity constraint of the present invention to
detect 3D-inconsistency with respect to a single point provides a natural way to
bridge between 2D algorithms (which assume that any 2D motion different from
the planar motion is an independently moving object) and 3D algorithms (which
rely on having prior knowledge of a consistent set of points or, alternatively,
dense parallax data).

4. New View Generation 
This section describes an approach based on the parallax rigidity
constraint for generating novel views using a set of "model" views.

Methods for generating new views based on recovering epipolar geometry
are likely to be more noise sensitive than methods that generate the new view
based on 2D information alone, i.e., without going from 2D through a 3D
medium in order to reproject the information once again onto a new 2D image
plane (the virtual view). The approach described below for new view generation
does not require any epipolar geometry or shape estimation.

Given two "model" frames, planar parallax motion can be computed for all
image points between the first (reference) frame and the second frame. An
image point with non-zero parallax is selected, and a "virtual" parallax vector is
defined for that point from the reference frame to the "virtual" frame to be
generated. The rigidity constraint (Eq. 13) then specifies a single constraint on
the virtual parallax motion of all other points from the reference frame to the
virtual frame. Since each 2D parallax vector has two components (i.e., two
unknowns), at least two "virtual" parallax vectors must be specified in
order to solve for all other virtual parallax vectors. Once the virtual parallax
vectors are computed, the new virtual view can be created by warping the
reference image twice: first, warping each image point by its computed virtual parallax;
then, globally warping the entire frame with a 2D virtual planar motion for the virtual
homography.
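The solve step can be sketched under two stated assumptions: that Eq. (13) has the cross-multiplied form (mu_i_j . n_j)(mu_a_v . n_v) - (mu_a_j . n_j)(mu_i_v . n_v) = 0, and that the registered position of a point equals its reference position plus its parallax (p_w = p + mu). Under those assumptions the constraint is linear in the unknown virtual parallax of point i, so two selected points a and b give a 2x2 linear system, solved here by Cramer's rule. All names are hypothetical.

```python
def perp(v):
    """Rotate a 2D vector by 90 degrees."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def constraint_row(p_i, pw_i, mu_i, sel):
    """One linear equation (row . m = rhs) on the unknown virtual parallax m of point i.

    sel = (p_a, pw_a, mu_a, mu_a_virtual) describes a selected point a: its
    reference position, its registered position and parallax in the model
    frame, and its user-specified virtual parallax.
    """
    p_a, pw_a, mu_a, mu_a_v = sel
    n_model = perp((pw_i[0] - pw_a[0], pw_i[1] - pw_a[1]))
    s_i = dot(mu_i, n_model)   # model-frame term of point i
    s_a = dot(mu_a, n_model)   # model-frame term of selected point a
    d_perp = perp((p_i[0] - p_a[0], p_i[1] - p_a[1]))
    ma_perp = perp(mu_a_v)
    row = (
        (s_i - s_a) * ma_perp[0] + s_a * d_perp[0],
        (s_i - s_a) * ma_perp[1] + s_a * d_perp[1],
    )
    rhs = s_i * dot(mu_a_v, d_perp)
    return row, rhs


def solve_virtual_parallax(p_i, pw_i, mu_i, sel_a, sel_b, eps=1e-9):
    """Solve the 2x2 system from two selected points; None if it is singular."""
    (a11, a12), b1 = constraint_row(p_i, pw_i, mu_i, sel_a)
    (a21, a22), b2 = constraint_row(p_i, pw_i, mu_i, sel_b)
    det = a11 * a22 - a12 * a21
    if abs(det) < eps:
        return None  # unfavorable configuration: constraints are not independent
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)
```

Returning `None` for a near-zero determinant is one simple way to mark points whose virtual parallax cannot be recovered reliably from the two selected points alone.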

Note that two virtual parallax vectors may not provide sufficient constraints for some
image points. This is due to an unfavorable location of those points in the image plane with
respect to the two selected reference points and their parallax vectors. However, other
image points, for which the constraint was robust and sufficient to produce reliable virtual
parallax, can be used (once their virtual parallax has been computed) as additional points to
reliably constrain the virtual parallax of the singular points.

D. The Generalized Parallax Constraint

This section describes how the pairwise-parallax constraint (Eqs. (11), (12), (13), and (14)) can be
extended to handle full image motion (as opposed to parallax motion), even when the
homography is unknown. This is useful for handling scenes that do not contain
a physical planar surface. A form of a generalized parallax constraint between two frames in
terms of the unknown homography parameters and the relative projective structure of pairs
of points is described.

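Eqs. (1) and (2) can be unified into a single form as:

$$p - \frac{A p'}{a_3^T p'} = \tau \left( T_z \frac{A p'}{a_3^T p'} - T \right) \tag{15}$$

where $\tau = \gamma / d'_\pi$ (see notations above). A is an unknown homography from frame 2 to
the first frame. It could relate to any planar surface in the scene, in particular a virtual
plane. Importantly, before factoring out the structure, we first factor out the epipole, in a
manner similar to that performed above.

Let $P_1$ and $P_2$ be two scene points with homogeneous coordinates $p_1$ and $p_2$. Then:

$$\frac{1}{\tau_1} \left( p_1 - \frac{A p_1'}{a_3^T p_1'} \right) = T_z \frac{A p_1'}{a_3^T p_1'} - T$$

$$\frac{1}{\tau_2} \left( p_2 - \frac{A p_2'}{a_3^T p_2'} \right) = T_z \frac{A p_2'}{a_3^T p_2'} - T$$

Subtracting the two equations eliminates $T$, yielding:

$$\frac{1}{\tau_1} \left( p_1 - \frac{A p_1'}{a_3^T p_1'} \right) - \frac{1}{\tau_2} \left( p_2 - \frac{A p_2'}{a_3^T p_2'} \right) = T_z \left( \frac{A p_1'}{a_3^T p_1'} - \frac{A p_2'}{a_3^T p_2'} \right) \tag{16}$$

The equality (16) entails that the vectors on both sides of the equation are parallel, which
leads to:

$$\left[ \frac{1}{\tau_1} \left( p_1 - \frac{A p_1'}{a_3^T p_1'} \right) - \frac{1}{\tau_2} \left( p_2 - \frac{A p_2'}{a_3^T p_2'} \right) \right]^T \left( \frac{A p_1'}{a_3^T p_1'} - \frac{A p_2'}{a_3^T p_2'} \right)_\perp = 0$$

Multiplying both sides by $\tau_2$ gives the generalized parallax constraint in terms of the relative
projective structure $\frac{\tau_2}{\tau_1} = \frac{\gamma_2}{\gamma_1}$:

$$\left[ \frac{\tau_2}{\tau_1} \left( p_1 - \frac{A p_1'}{a_3^T p_1'} \right) - \left( p_2 - \frac{A p_2'}{a_3^T p_2'} \right) \right]^T \left( \frac{A p_1'}{a_3^T p_1'} - \frac{A p_2'}{a_3^T p_2'} \right)_\perp = 0 \tag{17}$$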



The generalized parallax constraint (17) is expressed in terms of the homography A,
the image coordinates of a pair of points in two frames, and the relative projective structure
of the two points. The generalized constraint does not involve the epipoles.
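A minimal sketch of evaluating the generalized constraint follows. It assumes that Eq. (17) has the form [r (p1 - w1) - (p2 - w2)] . perp(w1 - w2) = 0, where r is the relative projective structure tau2/tau1 and w = A p' / (a3 . p') is the second-frame point warped by the homography A, with a3 taken to be the third row of A. The function names and that reading of the equation are assumptions made for illustration.

```python
def warp(A, q):
    """Apply the 3x3 homography A to the 2D point q and dehomogenize."""
    x, y = q
    u = A[0][0] * x + A[0][1] * y + A[0][2]
    v = A[1][0] * x + A[1][1] * y + A[1][2]
    w = A[2][0] * x + A[2][1] * y + A[2][2]
    return (u / w, v / w)


def generalized_parallax_residual(A, p1, q1, p2, q2, ratio):
    """Left-hand side of the assumed form of Eq. (17).

    p1, p2: the two points in the first frame.
    q1, q2: the corresponding points in the second frame.
    ratio:  candidate relative projective structure tau2 / tau1.
    """
    w1, w2 = warp(A, q1), warp(A, q2)
    n = (-(w1[1] - w2[1]), w1[0] - w2[0])  # perpendicular to (w1 - w2)
    vx = ratio * (p1[0] - w1[0]) - (p2[0] - w2[0])
    vy = ratio * (p1[1] - w1[1]) - (p2[1] - w2[1])
    return vx * n[0] + vy * n[1]
```

No epipole appears anywhere in the computation, which mirrors the statement above: only the homography, the four image positions, and the relative structure enter the residual.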

The generalized parallax constraint suggests a new implicit representation of general
2D image motion: rather than representing 2D image motion in terms
of homography plus epipole plus projective structure, it suggests an implicit representation
of 2D image motion in terms of homography plus relative projective structure of pairs of
points. Since this representation does not contain the epipole, it can be easily extended to
multiple frames.

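Using Eq. (17), the invariant relative projective structure $\frac{\tau_2}{\tau_1}$ can be expressed
explicitly in terms of the plane homography and the image positions of the two points in the
two frames:

$$\frac{\tau_2}{\tau_1} = \frac{(p_2 - p_{w_2})^T (\Delta p_w)_\perp}{(p_1 - p_{w_1})^T (\Delta p_w)_\perp} \tag{18}$$

where, for brevity, $p_{w_i} = A p_i' / (a_3^T p_i')$ and $\Delta p_w = p_{w_1} - p_{w_2}$.

Since the relative projective structure is invariant to new camera positions, it can be factored out
over multiple frames (j and k):

$$\frac{(p_2 - p_{w_2}^j)^T (\Delta p_w^j)_\perp}{(p_1 - p_{w_1}^j)^T (\Delta p_w^j)_\perp} = \frac{(p_2 - p_{w_2}^k)^T (\Delta p_w^k)_\perp}{(p_1 - p_{w_1}^k)^T (\Delta p_w^k)_\perp}$$

which leads to:

$$(p_2 - p_{w_2}^j)^T (\Delta p_w^j)_\perp \, (p_1 - p_{w_1}^k)^T (\Delta p_w^k)_\perp - (p_1 - p_{w_1}^j)^T (\Delta p_w^j)_\perp \, (p_2 - p_{w_2}^k)^T (\Delta p_w^k)_\perp = 0 \tag{19}$$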


Eq. (19) is a rigidity constraint on a pair of points across three frames. Like the 
trilinear tensor of the prior art, it involves the parameters of two homographies across three 
frames. Unlike the trilinear tensor, it does not contain the epipole, but instead, is expressed 
on pairs of points. 
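The three-frame test can be sketched as follows, assuming that the relative projective structure is the ratio of two perpendicular projections (as in Eq. (18)) and that Eq. (19) is the cross-multiplied equality of that ratio between frames j and k. The homographies `A_j` and `A_k` are assumed to map frames j and k into the reference frame; the function names and these readings of the equations are illustrative assumptions.

```python
def warp(A, q):
    """Apply the 3x3 homography A to the 2D point q and dehomogenize."""
    x, y = q
    u = A[0][0] * x + A[0][1] * y + A[0][2]
    v = A[1][0] * x + A[1][1] * y + A[1][2]
    w = A[2][0] * x + A[2][1] * y + A[2][2]
    return (u / w, v / w)


def structure_terms(A, p1, q1, p2, q2):
    """Numerator and denominator of the assumed Eq. (18) ratio tau2 / tau1."""
    w1, w2 = warp(A, q1), warp(A, q2)
    n = (-(w1[1] - w2[1]), w1[0] - w2[0])  # perpendicular to (w1 - w2)
    num = (p2[0] - w2[0]) * n[0] + (p2[1] - w2[1]) * n[1]
    den = (p1[0] - w1[0]) * n[0] + (p1[1] - w1[1]) * n[1]
    return num, den


def generalized_rigidity_residual(A_j, A_k, p1, p2, q1_j, q2_j, q1_k, q2_k):
    """Residual of the assumed Eq. (19) for a pair of points over three frames.

    p1, p2 lie in the reference frame; q*_j and q*_k are their matches in
    frames j and k. A value near zero indicates a rigid pair.
    """
    num_j, den_j = structure_terms(A_j, p1, q1_j, p2, q2_j)
    num_k, den_k = structure_terms(A_k, p1, q1_k, p2, q2_k)
    return num_j * den_k - den_j * num_k
```

Cross-multiplying rather than dividing avoids a division by zero when one of the denominators vanishes, at the cost of an unnormalized residual.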

The trilinear constraints are based on an initial reference point, and any additional
point adds four linearly independent equations to constrain the unknowns of the tensor
(which are combinations of the homography parameters and the epipoles).

In the generalized parallax rigidity constraint, the basis is a pair of points. Here also, 
any additional point adds four linearly independent rigidity constraints. These can be 
derived through factoring out Tz from Eq. (16) with the additional third point (still within a 
pair of frames) to form the four linearly independent equations over three frames. 

Although various embodiments which incorporate the teachings of the present 
invention have been shown and described in detail herein, those skilled in the art can readily 
devise many other varied embodiments that still incorporate these teachings. 
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We Claim: 

1. Method for image processing comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax-related constraint for a pair of points within the plurality of images;

c) applying the parallax-related constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax-related constraint; and

d) using the generated information for an image processing task related to the received plurality of images.

2. Method for image processing to detect a moving object comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax rigidity constraint for at least a pair of points within the plurality of images;

c) applying the parallax rigidity constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax rigidity constraint; and

d) using the generated information to perform moving object detection related to the received plurality of images.

3. The method of claim 2, wherein the parallax rigidity constraint derives from position information of the pair of points and parallax vectors of the pair of points for three images.

4. The method of claim 2, wherein the parallax rigidity constraint derives from position information of at least three points and parallax vectors of the at least three points for two images.

5. Method for image processing to recover scene structure comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax structure constraint for a pair of points within the plurality of images;

c) applying the parallax structure constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax structure constraint; and

d) using the generated information to perform recovery of scene structure related to the received plurality of images.

6. Method for image processing to generate new views comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax rigidity constraint for a pair of points within the plurality of images;

c) applying the parallax rigidity constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax rigidity constraint; and

d) using the generated information to perform new view generation related to and based on the received plurality of images.

7. The method of claim 6, wherein the parallax rigidity constraint derives from position information of the pair of points and parallax vectors of the pair of points for three images.

8. The method of claim 6, wherein the parallax rigidity constraint derives from position information of at least three points and parallax vectors of the at least three points for two images.

9. Apparatus for image processing comprising:

a source of two dimensional images representative of a scene;

a computer processor for processing the two dimensional images including:

means for receiving a plurality of two dimensional images representative of a scene;

means for computing a parallax-related constraint for a pair of points within the plurality of images;

means for applying the parallax-related constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax-related constraint;

means for using the generated information for an image processing task related to the received plurality of images and generating an output related thereto;

an output device for presenting the output of the image processing task.

10. The apparatus of claim 9, wherein the source of images includes a video camera.

11. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform the steps of:

a) receiving, in digital form, a plurality of two dimensional images representative of a scene;

b) computing a parallax-related constraint for a pair of points within the plurality of images;

c) applying the parallax-related constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax-related constraint; and

d) using the generated information for an image processing task related to the received plurality of images.



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AMENDED CLAIMS 

[received by the International Bureau on 2 July 1997 (02.07.97);
original claims 1-11 replaced by new claims 1-17 (5 pages)]

1. A method for image processing comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax related constraint for a pair of points within the plurality of images, said parallax related constraint being independent of any epipolar geometry which may be defined for the pair of points;

c) applying the parallax related constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax related constraint; and

d) using the generated information for an image processing task related to the received plurality of images.

2. A method for image processing to detect moving objects comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax rigidity constraint for at least a pair of points within the plurality of images, said parallax rigidity constraint being independent of any epipolar geometry which may be defined for the pair of points;

c) applying the parallax constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent or inconsistent with the parallax constraint; and

d) using the generated information to perform moving object detection related to the received plurality of images.

3. The method of claim 2, wherein the parallax rigidity constraint is derived from position information of the pair of points and parallax vectors of the pair of points generated from three of the received plurality of images.

4. The method of claim 2, wherein the parallax rigidity constraint is derived from position information of at least three points and parallax vectors of the at least three points generated from two of the received plurality of images.

5. The method of claim 2 wherein:

step (c) further comprises the steps of:

(c1) applying a two dimensional transformation to said plurality of images to align regions of said plurality of images and identify a plurality of misaligned regions of said plurality of images;

(c2) segmenting and aligning said identified regions to produce segmented regions; and

step (d) further comprises the step of:

(d1) iteratively aligning and segmenting the segmented regions until a remaining region fulfills a criteria that identifies the remaining region as a residual motion within said plurality of images.

6. The method of claim 5 wherein the criteria fulfilled by the remaining region includes inconsistency with the parallax rigidity constraint.

7. The method of claim 2 wherein:

step (b) comprises the steps of:

(b1) applying a two dimensional transformation to said plurality of images to align regions of said plurality of images and identify a plurality of misaligned regions of said plurality of images; and

step (c) comprises the steps of:

(c1) segmenting said misaligned regions to produce segmented regions;

(c2) iteratively aligning and segmenting the misaligned regions until a remaining misaligned region fulfills a criteria that identifies the remaining misaligned region as motion within said plurality of images.

8. A method for image processing to recover scene structure comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a parallax structure constraint for a pair of points within the plurality of images, said parallax structure constraint being independent of any epipolar geometry which may be defined for the pair of points;

c) applying the parallax structure constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax structure constraint; and

d) using the generated information to perform recovery of scene structure related to the received plurality of images.

9. A method for image processing to generate new views comprising the steps of:

a) receiving a plurality of two dimensional images representative of a scene;

b) computing a planar parallax motion trajectory for a subset of points within the plurality of images to generate information representative of relative parallax motion among the subset of points, said planar parallax motion trajectory being independent of any epipolar geometry which may be defined for the subset of points;

c) selecting at least two points from the subset of points having non-zero parallax motion;

d) defining respective virtual parallax motion vectors for each of the selected points, the virtual parallax motion vectors representing respective locations of the selected points in a virtual scene;

e) applying a parallax rigidity constraint to at least the subset of points within the plurality of images with reference to the defined parallax motion vectors to generate a parallax motion vector for each of the subset of points;

f) warping each point in the plurality of images having a parallax motion vector by that parallax motion vector to generate an intermediate virtual scene; and

g) globally warping the intermediate virtual scene with a 2-D virtual planar motion corresponding to the virtual scene to produce the virtual scene.

10. The method of claim 9, wherein the parallax rigidity constraint is derived from 
position information of the at least two points and the parallax vectors of the at least two 
points generated from three of the received plurality of images. 

11. The method of claim 9, wherein the parallax rigidity constraint is derived from
position information of at least three points and parallax vectors of the at least three points 
generated from two of the received plurality of images. 

12. Apparatus for image processing comprising: 

a source of two dimensional images representative of a scene; 

a computer processor for processing the two dimensional images including: 

(a) means for receiving a plurality of two dimensional images representative of 
a scene; 

(b) means for applying a parallax-related constraint to a plurality of points
within the plurality of images to generate information representative of whether a given
point within the plurality of images is consistent with the parallax-related constraint,
said parallax-related constraint being independent of any epipolar geometry which may
be defined for the plurality of points;

(c) means for applying a parallax constraint to a plurality of points within the
plurality of images in order to generate information representative of whether a given
point within the plurality of images is consistent with the parallax-related constraint;

(d) means for using the generated information for an image processing task
related to the received plurality of images and generating an output signal related
thereto;

(e) an output device for presenting the output signal of the image processing
task.

13. The apparatus of claim 12, wherein the source of images includes a video camera.

14. Apparatus for detecting object motion within a sequence of images representing a scene comprising:

means for selecting a plurality of images from said sequence of images;

means for applying a parallax constraint to said plurality of images to align regions of said plurality of images and identify a misaligned region of said plurality of images;

means for identifying parallax motion within said misaligned region; and

means for removing parallax motion from the misaligned region to detect a moving object within the scene.

15. The apparatus of claim 14 wherein said means for removing parallax motion further comprises means for identifying image components that are moving consistently with respect to a parallax motion constraint to identify parallax generated motion within residual motion in the scene.

16. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform the steps of:

a) receiving, in digital form, a plurality of two dimensional images representative of a scene;

b) computing a parallax-related constraint for a pair of points within the plurality of images, said parallax-related constraint being independent of any epipolar geometry which may be defined for the pair of points;

c) applying the parallax-related constraint to a plurality of points within the plurality of images in order to generate information representative of whether a given point within the plurality of images is consistent with the parallax-related constraint; and

d) using the generated information for an image processing task related to the received plurality of images.

17. A computer readable medium having stored thereon a plurality of instructions 
including instructions which, when executed by a processor, cause the processor to perform 
the steps of: 

(a) selecting a plurality of images from said sequence of images; 

(b) applying a two dimensional transformation to said plurality of images to align 
stationary regions of said plurality of images and identify a misaligned region of said plurality 
of images; 

(c) determining residual motion within said misaligned region; and 

(d) removing parallax motion from said residual motion to detect a moving object 
within the scene. 
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